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Abstract —In a cascading power transmission outage, com¬ 
ponent outages propagate non-locally; after one component 
outages, the next failure may be very distant, both topologically 
and geographically. As a result, simple models of topological 
contagion do not accurately represent the propagation of cascades 
in power systems. However, cascading power outages do follow 
patterns, some of which are useful in understanding and reducing 
blackout risk. This paper describes a method by which the data 
from many cascading failure simulations can be transformed 
into a graph-based model of influences that provides actionable 
information about the many ways that cascades propagate in 
a particular system. The resulting “influence graph” model is 
Markovian, in that component outage probabilities depend only 
on the outages that occurred in the prior generation. To validate 
the model we compare the distribution of cascade sizes resulting 
from n — 2 contingencies in a 2896 branch test case to cascade 
sizes in the influence graph. The two distributions are remarkably 
similar. In addition, we derive an equation with which one can 
quickly identify modifications to the proposed system that will 
substantially reduce cascade propagation. With this equation one 
can quickly identify critical components that can be improved to 
substantially reduce the risk of large cascading blackouts. 

Index Terms —Cascading failure, big data, complex networks 

I. Introduction 

P OWER systems are generally robust to small disturbances, 
but unexpected combinations of failures sometimes ini¬ 
tiate long chains of cascading outages, which can result in 
massive and costly blackouts. Because of the low-probability 
high-impact nature of cascading failures, there is limited 
empirical data from which to understand the many ways that 
cascades propagate through a power system. Simulations of 
cascading mechanisms can produce data, but there has been 
insufficient progress in extracting insights and useful statistical 
information from these data. 

One approach to obtaining useful information is to try 
and leverage successes from network science, which has an 
extensive literature on cascading failure and (more generally) 
contagion. Models of topological contagion in which failures 
spread locally from a node to its immediate neighbors (T), 
[21, (3) have provided insight into a variety of problems, such 
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as disease propagation |4), [5j. Similar topological contagion 
models have been suggested as a tool for power systems 
analysis (e.g., 0, Q). 

This approach has two problems. First, in power grids it is 
more appropriate to focus first on line (edge) outages, since 
line outages are an order of magnitude more likely than node 
(substation) outages. More importantly, as is well known to 
power systems engineers, cascading outages in power systems 
propagate non-locally as well as locally. In a power grid, the 
next component to fail after a particular line outages may be 
very distant, both geographically and topologically 0. This 
can be seen very clearly from the sequence of events in the 
Western US blackout of 1996, shown in Fig. [l] in which the 
failure sequence jumps across long distances at several points 
in the cascade. Cascades in power networks are more similar 
to the random fuse model m from statistical physics, in which 
non-local propagation does occur. 



Fig. 1. An illustration of the event sequence for the Western US blackout on 
July 2, 1996 (from 1 10)). The sequence jumps across hundreds of kilometers 
at several points, such as from (3) - @ and from © - (8) . 

On the other hand, simulation models that capture as¬ 
pects of detailed power grid physics and engineering are 
useful for understanding how cascades propagate. Impor¬ 
tantly these models do show the non-local propagation that 
is apparent in historical cascades. Examples of physics- 
based models of cascading include quasi-steady state models, 
such as DCSIMSEP E). (HI OPA El. G33. Manchester 
model JT51 . and TRELSS E), and dynamic models, such as 
COSMIC ifTTl and the hybrid model in [18]. While engineer¬ 
ing models are useful, the data that result from this type of 
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simulation are complicated and difficult to summarize, even 
for a single cascading failure simulation. In order to obtain 
useful statistical information, thousands, or even millions of 
simulations are often required, resulting in enormous sets of 
complicated output data. To summarize and understand big 
data from detailed simulations, there is a need for higher-level 
models of cascading in large-scale systems that are simple 
enough to provide statistical insight, without abstracting away 
physical details in a way that could lead to erroneous con¬ 
clusions. Only a few such statistical models, which do not 
neglect the non-local nature of cascading, have been proposed 
in the literature. Reference CD finds critical clusters of lines 
in simulated cascade data using a synchronization matrix, 
which determines the critical clusters as sets of lines that 
frequently overload in the same cascade and in a cascade that 
leads to a large blackout. This approach does not consider the 
order in which the lines overload during the cascade, but does 
indicate combinations of critical lines that are associated with 
blackouts. The general idea of building an influence graph 
describing ways that component failures might spread goes 
back to (201, ED- 

This form of influence graph couples Markov processes 
at each node, but it remains unknown how to construct 
these influence graphs from data. Stochastic models in [22], 
(23l use Markov-chains to represent key characteristics of 
simulated cascades. In prior work, the authors proposed a 
“line-interaction graph” approach to modeling cascades from 
data ED- A matrix-based approach, which is in some ways 
similar to ours, was used to describe the probability of a 
cascade propagating from one network node to another in [25]. 
Ref. lf25l is primarily motivated by avalanches of neurons 
firing, but also mentions blackouts, and analyzes the the 
statistics of the number of generations in a cascade as well 
as the number of failures. One difference is that [25 ] restores 
nodes in the generation after they fail, which allows cascades 
to propagate through previously failed nodes and prolongs the 
cascades. The ideas in 1241 were extended in (26) to build an 
interaction model of cascading that can reproduce statistical 
properties of simulated cascades. 

This paper presents a new influence graph method, also 
based on the concepts in (24), in which we take large amounts 
of data from cascade simulations and synthesize the data into 
a Markovian network model. Importantly, the resulting model 
has a network structure through which cascades propagate 
locally, but that network structure is dramatically different 
from that of the original power network. That is, the outages 
propagate along the influence graph, which is completely 
different than the graph topology of the grid. We emphasize 
that any cascading failure simulation based on engineering 
principles can be used to produce the data needed to synthesize 
the influence graph. 

II. Method: Statistical modeling of non-local 
cascades 

A cascading failure begins with one or more initiating 
events, typically component outages. For example a tree falls 
into a transmission line, or an operator error results in one 


or more component outages. Each initiating event will perturb 
the state of the network, which may result in excessive stress 
on other components. Because of the laws of power flow, 
these stressed components may be topologically distant from 
the initiating events. Excessive stress may cause one or more 
dependent outages, which may subsequently cause additional 
stress and additional outages. Together this sequence is a 
cascading failure. 

Prior work ED has shown that one can gain substantial 
insight into the statistical properties of cascading failure by 
grouping sequences of component outages into generations, 
and looking at the growth (or propagation) rates among 
generations. Typically, the first generation represents the ex¬ 
ogenously caused initiating events. Subsequent generations 
can be thought of as “children” of prior “parent” outage 
generations. In this branching process model, each generation 
of failures produces some number of dependent failures and 
the rate at which prior generation (parent) outages cause new 
(child) outages is known as the propagation rate, A. If \Zq\ 
is the total number of outages (over many cascades) in the 
set of all initiating events over many cascades and \Zi\ is the 
number of outages in the first dependent generation, then the 
propagation rate from generation zero into the first generation 
is: Aq = \Zi\/\Zq\. More generally: 


However, the approach of 127 ) simply counts outages with¬ 
out discriminating which component outages will result from a 
particular prior outage, or how the components relate to other 
components. It is clear that component outages vary in their 
frequency and impact on other components. For example, the 
outage of a large transmission line carrying a large amount of 
power is likely to result in more dependent outages than the 
failure of a smaller line. Thus, it is useful to consider different 
components as having different propagation rates. With this 
in mind, our model assumes that each component i produces 
a random number, of child outages according to the 

following conditional probability function: 

f[k\i,m\ = Pr [k outages in generation m- hi, given 

a single outage of i in generation m] (2) 

In this paper we assume that f[k\i, m] is a Poisson distribution, 
with A i :Tn representing the mean number of outages propagated 
by the outage of i in generation(s) m. 

Finally, if component i outages and causes a dependent 
outage in the next generation, some components are more 
likely to outage in the next generation than others. For the case 
of line outages, this increased likelihood comes from a number 
of factors including the way in which currents are redistributed 
(which can be estimated from line outage distribution factors) 
and the proximity of particular components to their tripping 
threshold. Therefore, it is important to model the conditional 
probability of component j failing, given the failure of i. Let: 

g[j\i, mn\ = Pr[j fails in generation m + 1 given 

a single outage of i in generation m 
and one outage in generation m + 1] (3) 
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For each generation m, g[j\i , m\ can be considered as the i, j 
element of a matrix that defines a weighted, directed graph of 
influences among components. 

Together / and g form a model, here referred to as an 
influence graph, that is Markovian in the sense that the outage 
probabilities depend only on the outages that occurred in the 
prior generation (see Sec. [Til). 


A. Estimating the parameters for f and g from data 

In this paper we estimate the parameters for / and g from 
data obtained from many cascading failure simulations. While 
it may be possible in principle to design a method to estimate / 
and g directly from engineering information about the system, 
doing so would require extensive knowledge and difficult 
assumptions about how cascades propagate. By estimating the 
model’s parameters from data our method can be applied to 
any cascading simulation or model, so long as there is a 
discrete set of components that can fail, and these failures 
can be grouped into generations. 

Let us assume then that a trusted engineering simulation 
model of cascading failure exists for a given network at a 
particular state, and that we can perturb this model with many 
random disturbances and thus produce a large amount of 
cascading failure sequence data. The first step in building the 
influence graph is to group the outages from the sequence data 
into generations, typically by dividing the event sequences by 
finding pairs of events that are separated by some amount of 
time (see [281, (27)). After this grouping, we can let zffl 
represent the set of outages in generation m of cascade d, and 
| Z%P | represent the number of outages within this set. 

In order to estimate / one first needs to decide the extent 
to which / will be modeled to depend on the generation m. 
For most of the results in this paper we provide separate 
estimates of / for the initiating generation f[k\i,0\ and for 
the subsequent generations f[k\i, 1+]. The rationale for this 
is that the n — 1 security criterion results in networks being 
quite robust to initial outages, but after a cascade has already 
started outages propagate with a much higher probability. For 
the simulated cascades used in this paper, this propagation 
rate does not change dramatically among the subsequent 
generations of dependent outages (m > 1), denoted by 1+. 
Thus, we describe the post-initiating-event propagation rates 
using the single (vector) distribution f[k\i, 1+]. 

We assume that f[k\i,m\ follows a Poisson distribution, 
and need to estimate the Poisson parameter (the mean of 
Ki m) for each component i and each generation (or set of 
generations), m. Let P^ m represent the number of times (in 
a set of cascades) that each component i appears as a parent 
outage in generation(s) m, and C* ?m denote the total number of 
“effective children” that result from the outage of component i 
in the respective generation(s). More specifically, if is 
the set of cascades within which i appears as a parent in 
generation(s) m, then: 


P — ID 
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(4) 
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For example, if i fails along with one other component j in 
generation 0 of a particular cascade and generation 1 of that 
same cascade includes three outages, we would count 3/2 = 
1.5 additional “effective children” for each of i and j , and 
thus add 1.5 to G* ? o and Cj 5 o and increment both P^ 0 and 
P J? o- After counting P ijm and C* ?m for all of the cascades 
in a dataset, the Poisson parameter for f[k\i,m] is A= 
Ci,m /Pi,m. 

When defined in this way, the weighted average (over i) 
of each A* ?m is equal to the overall propagation rate for the 
respective generation(s), A m . For the specific case in which 
we estimate two sets of parameters for A^o and A^i + we get 
the following pair of relationships: 


v _ \%i\ _ EILi^,o 
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where M is the maximum generation index m over all of the 
cascades in the dataset. 

To build the graph of inter-component influences, G, we 
again iterate through each cascade d and generation m. For 
each d and m in which i occurs as a parent and j occurs as 
a child in the next generation (m + 1), we add the fraction 
l/\zffi\ to a counter g c [j\i]- Finally, the individual elements 
of G are estimated as follows: 


g[M = 


9 c\M 


( 8 ) 


Normalizing in this way ensures that g[j\i] acts as a con¬ 
ditional probability, as per our definition in ([3]), such that 
J2j=i 9[jV\ = 1- Note that in this paper we assume that g 
does not change with generation m. The reason for this is that 
even with data from many simulated cascades, only a fairly 
limited number of observations are available for most of the 
potential sequence pairs i—>j. 


B. 6-bus illustration 

To illustrate the process of building / and g , this section 
describes the formation of the influence graph for a slightly 
modified version of the Wood and Wollenberg 6-bus test 
case f29l (see Fig. [2]). After removing two transmission lines 
(for graphical clarity), all of the pre-contingency line flows 
were below the rated limits, but the system was not initially 
n — 1 secure. To generate cascading outage data, 1000 sets 
of one or more initiating outages were randomly generated 
assuming that each of the nine transmission lines had an 
equal outage probability of po = 1/100. Dependent outage 
sequences were generated using the DCSIMSEP cascading 
failure model from CD, 03. These sequences were then 
grouped into generations by separating outages that were 
distant in time by at least At m 5 seconds. Note that this is a 
relatively small value for At compared to previous branching 
process applications (e.g., |[27l ). Since the 6-bus case is very 
small, and used here only to illustrate the influence graph 
concept, choosing a small At seemed appropriate. Given data 
separated into generations we computed a single distribution 
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for /, as shown in Table [T] Similarly the parameters for g were 
computed as described in Sec. II-A The results for / and g 
are illustrated in Fig. [3] 

A number of observations result from inspection of / and 
g. From / we see that one of the nine lines (line 2-4) has a 
much higher propagation rate than the other eight, a clear sign 
of its critical importance to this particular system. We also find 
two different common paths of cascading. One path includes 
the set of lines {3-6,5-6,1-2,1 -4} and a second includes {3- 
5,4-5,2-5,2-3}. The outage of 2-4 can propagate a subset of 
either of these two paths. 


TABLE I 

Parameters for / for the 6-bus test case 


Line: 

1-2 

1-4 

2-3 

2-4 

2-5 

3-5 

3-6 

4-5 

5-6 

Pi 

412 

416 

401 

106 

498 

100 

104 

503 

193 

Ci 

208.5 

98.5 

195.5 

401 

298.5 

99 

102 

297.5 

107.5 

A i 

0.51 

0.24 

0.49 

3.78 

0.60 

0.99 

0.98 

0.59 

0.56 



Fig. 2. The modified 6 bus test case with three generators (circles) and 3 
loads (triangles). Line widths indicate absolute line loading (MW) and colors 
indicate precent of rated loading. 



Fig. 3. Illustration of the influence graph for the 6-bus test case. The edge 
weights represent the conditional probability of one line outage propagating 
associated line outages. 

III. Simulating cascading failures given / and g 

Once formed, / and g can be used to rapidly generate many 
synthetic cascades, which have statistical properties similar to 
those of the original data. 

To simulate the influence graph (see Fig. [4]), we start with 
a large set of initiating events. These can be produced by a 
variety of sampling methods, such as Monte-Carlo or complete 
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Fig. 4. Illustration of an influence graph simulation for a small network 
with six components. Initially node 2 fails due to some exogenous cause. 
Based on a draw from / 2 , this failure causes k = 2 dependent failures in 
the next generation. Drawing twice from g[j\i = 2] results in nodes 1 and 
5 failing in the next generation. In generation 1, the failure of node 1 does 
not produce any additional children, but the failure of 3 produces one “child” 
failure (node 3). Finally, the failure of node 3 does not produce any children, 
thus ending the cascade. 


enumeration of a plausible contingency list (e.g., all n — 2’s). 
For Monte-Carlo sampling, if initiating outages are assumed 
to be independent, then this can be a simple vector of failure 
probabilities for each of n components. Given the chosen 
sampling method, the influence graph can be simulated as 
follows: 

1) Initialize the cascade index: d— 1 

2) Initialize the generation index: m = 0 

3) Produce a set of exogenous initiating outages via 
the chosen sampling method. 

4) For each outage i € 4 ?, do the following: (a) Deter¬ 
mine how many child outages n result from i by sampling 
from f[k\i,m\. (b) Determine which outages result from 
outage i by sampling from g[j\i] k times. This sampling 
is done using Bernoulli trials, such that for each trial 
{1... k} component j will fail if g[j\i] > r, where r 
is a uniformly distributed random number 0 < r < 1. 
The result is a set of outages for the next dependent 
generation, Z^ +1 . 

5) If Z^ +1 includes at least one outage, then increment the 
generation index m = m + 1 and continue from Step [4] 

6) Otherwise, increment the cascade counter d = d + 1 and 
continue from Step [2] 

Note that a given component j may be selected for failure 
more than once. Since a component cannot fail multiple times 
(and restoration is not included in this model), this means that 
the total number of new child outages may be slightly less 
than k. 

A. Simulated cascades in a larger test case 

To illustrate the influence graph method, we produced a 
dataset of cascading outages by simulating the impact of each 
n— 2 transmission branch (transformer or line) outage in an n— 
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1 secure version of the winter peak Polish case available with 
MATPOWER (30), using the same simulator (DCSIMSEP) 
and case data as in hd, da. The test case has n = 2896 
branches, which resulted in n(n — l)/2 = 4 191 960 initiating 
n — 2 contingencies, of which 3170 resulted in at least one 
dependent outage. Fig. [5] shows the distribution of cascade 
sizes for this dataset. 



Fig. 5. Probability distribution of cascade sizes, measured by the number 
of branches failed, for the original DCSIMSEP simulation data (empirical), 
and for the influence graph (simulated) data. Data are binned so that each bin 
contains at least 26 observations. 



Finally, we simulated artificial cascades by applying each of 
the 4191 960 possible n — 2 contingencies with the resulting 
influence graph. Figure [5] shows the resulting probability 
distribution of cascade sizes, measured as the total number of 
outages in each cascade (including the initiating outages), for 
the empirical data and the simulated data from the influence 
graph. The influence graph method matches the empirical 
data quite well, with the exception that the influence graph 
does not reproduce the frequency of the very longest cascades 
observed in the empirical data. Otherwise the match between 
the simulated and empirical data is quite good. 

Another way to compare the influence graph data to the 
original data is to look at the frequency with which particular 
components appear in the two datasets. One would expect a 
valid model of the original data to show that component i 
outages with a frequency that is similar to its outage rate in 
the original data. To evaluate if this is indeed the case, Figure [7] 
compares the outage rates (the number of dependent outages 
of i divided by the total number of dependent outages) in the 
simulated data and in the empirical data. The match between 
these two rates provides further evidence for the validity of 
this approach. 



Occurrance rate in empirical data 


Fig. 6. Empirical probability density functions for the component-wise 
propagation rates in the initiating generation, A 7 ; 5 o, and the subsequent 
generations, A^i+. Data are binned as in Fig. [5] 


The outage data were subsequently separated into gen¬ 
erations by assigning all of the initiating outages to the 
first generation and distributing the dependent events into 
subsequent generations by assigning events that were at least 
30 seconds apart CD to different generations. We chose this 
value for At based on the fact that fast transients and auto¬ 
recloser actions are typically completed within 30 seconds. 
We also tested At = 15s and At = 60s, and found that 
the parameters for / were not substantially altered within 
this range. After separating the outage data into generations, 
the parameters for / and g were constructed as described in 
Sec. II-A Figure [6] shows the distribution of values for A^o 
and Notably, both distributions are heavy tailed; for the 

majority of components Ai ?m = 0, but a few components tend 
to produce a much larger number of “child” outages. 


Fig. 7. Comparison of the outage rate of components in the influence-graph 
simulated data and in the original cascading failure data. Circles (o) show 
the rates for all dependent events and x shows only dependent events that 
occurred in generation m = 4 and later. 


IV. Extracting useful information from the 

INFLUENCE GRAPH 

Once the influence graph (/ and g) is built from data the 
results can shed valuable insight into the general properties of 
cascading in a particular system. 

In order to do so, we first combine / and g into a single 
graph H that captures the relative strength of the influences 
among the component outages. Consider that in an arbitrary 
generation m of influence-graph-simulated cascade d, com¬ 
ponent i fails alone (Z^ = {z}) and a random draw from 
f[k\i,m\ indicates that there should be (about) K = k failures 
in the next generation. Then the conditional probability that a 
particular component j fails in the next generation (m + 1), 
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given that i failed in generation m and that generation m + 1 
includes exactly k failures is: 

Pr(j|{*) k}) = 1 — (1 — g\j\i]) k (9) 

Let hij^ m be the conditional probability that a particular 
component j fails in generation m + 1 given that z failed 
in ra, over all values of fc. can be computed from 0 

using the assumption that K is a Poisson random variable: 

n—1 

hij, m = 

k= 0 
oo 

~ E ( x _ “ a[M) k ) f\ k \h m\ 

k =0 
00 

=E( i -( i -«w]) i ) 

k= 0 

— g — ^i,m9[j I*] 

Thus defined, can be thought of as the z, j element of 

a matrix H m that combines / and g. H m has the properties 
of a weighted adjacency matrix for a directed graph. Since, 
for our larger test case, we defined two different distributions 
for f[k\i,m = 0] and f[k\i,m = 1+], we end up with two 
different matrices H 0 and Hi + that, respectively, describe the 
propagation of the initiating contingency and the subsequent 
dependent events. Note that the nodes of the influence graph 
do not represent system states, and thus H differs from a 
typical Markov chain transition matrix. In particular, H is not 
a stochastic matrix because the events comprising one of the 
components outaging after component z outages are neither 
exclusive nor exhaustive; it is routine that no components or 
several components outage after component z outages. 

A. Using H to find critical components 

A particularly important question in the study of cascading 
failure risk is that of identifying critical components, the 
failure of which could result in particularly large cascading 
failures. Or, more practically, finding components that could 
be improved in some way to substantially reduce blackout risk. 
Prior work m , ed has suggested that some components, 
when they fail as a part of a multiple initiating contingency, 
contribute orders of magnitude more to blackout risk, relative 
to the average component. Here we suggest a method for 
using the information contained in H 0 and Hi + to find those 
components that could propagate large cascading failures if 
they fail during a cascade, as opposed to during the initiating 
contingency. This type of information could be useful to 
power system planners in identifying components that should 
be prioritized for more thorough vegetation management or 
upgraded protection systems, or to operators in making adjust¬ 
ments to the dispatch to reduce blackout risk by decreasing 
line loadings. 

We start by defining a vector of independent binary 
(Bernoulli) random variables, so, that describes the space 
of possible initiating contingencies. So is defined such that 
s^o = 1 indicates that component i outages as a part of 


( 10 ) 

(ID 

( 12 ) 

(13) 

(14) 


the initiating contingency and s^o = 0 means that i did not 
initially fail. pi : o is the probability that = 1 and thus £+o 
also is also the expected value of s^q. More generally, let 
represent the probability that z outages in generation ra, and 
p m be the column vector of these probabilities. 

Let us now use H m and the outage probabilities p m to 
find the probability that a particular component j outages in 
generation ra + l. gives us the probability that j outages 

in generation ra + l, given that z outaged in generation ra. 
The probability Pj,m+ l that j outages in generation ra + l 
is the probability of the union of all the ways that j might 
have failed. If the individual interaction probabilities in H are 
small and the ways are independent (or disjoint), then we can 
neglect the higher order probabilities (such as the probability 
that j failed due to the combination of i and fc), and obtain 
the following approximation for the probability that j fails in 
generation ra + 1: 


Pj,m +1 — +Pr [j(m + 1)1 i(m)] Pr [i(m)\ (15) 

i= 1 
n 

— ^ ^ hi,j,mPi,m (16) 

i= 1 

As a result, the outage probabilities in the first and subsequent 
generations can be estimated by simple matrix multiplication: 

Pi = Po H o (17) 

Pm+l = Pm H l+ , m > 1. (18) 

After a long cascade, the probability of each component having 
failed at some point during a cascade (the vector a) can be 
found as follows: 

oo oo 

aT = E Pm = Po T + Po T Ho E H ™+ < 19 ) 

m =0 m =0 

Eq.® will evaluate to a finite quantity, so long as the abso¬ 
lute values of the eigenvalues of Hi + are less than 1, which 
holds for our test cases. Since J^m=o-^i+ = (I — Hi + ) -1 , 
Eq.® can be rewritten more simply as: 

a T = Po+Po H o( I - H i+) _1 (20) 

This is a useful expression as it enables us to quickly under¬ 
stand which components are most likely to be involved in a 
cascading failure. Also, the sum of a gives an estimate of the 
expected size of the cascades that could result from the vector 
of initiating probabilities po. 

Now we turn our attention to using ( [20} to estimate the 
impact of design changes on cascading failure propagation. 
Let us assume that we know that if we upgrade component j 
(perhaps by replacing line j’s distance relays with current 
differential relays, or improving vegetation management on 
its transmission corridor) we can reduce the probability of line 
j outaging due to an overload, in response to other outages 
in the system. We can represent this change by reducing the 
probabilities in the j th column of Ho and Hi + by subtracting 
column perturbation vectors Jq and 5% from Hq and Hi + , 
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respectively. Then the result of the perturbations to Ho and 
Hi + can be estimated by calculating 


a' T = P J 0 + Po(Hq - <5 0 eJ)(I - H 1+ + ^ej) -1 (21) 


where ej is an indicator vector with 1 in the j th element and 
0 elsewhere. 


However, what we really care about, if we want to estimate 
the impact of potential upgrades, is the change in a that results 
from the perturbations Sq and Si, not the absolute value of a'. 
This change is computed by subtracting ( [21} from (20): 


Aa/ = a T —a' T = (22) 

pj (H 0 (I - H 1+ ) -1 - (Ho - d 0 oj)(l - H 1+ + SieJ)- 1 ) 

Applying the Sherman-Morrison formula l(32l for rank-1 up¬ 
dates of an inverted matrix and simplifying leads to: 


Aa/= Po %eJ(I-H 1+ ; 
Po T (Ho - <5oeJ)2 


-1 


•H 1+ )-M ie T(I-H : 


i+ 


(23) 

r 1 


1 + eVl-H 1+ )- 1 «5 1 


Finally we can sum Aa j to define the overall impact ay of 
modifications £o and Si on line j : 


n 

a j = Aa i’ k 

k =1 


(24) 



Fig. 8. Illustration of the combined influence matrix H for the 6 bus case. 
Edge weights show the probability of an outage at the origin node resulting 
in an outage at the destination node. 


These qualitative observations can be made quantitative by 
applying equations ( [20} and ( [24} to H for the 6-bus case. The 
importance of component (2-4) to the system can be observed 
by computing ( [20} after setting po to indicate that only (2-4) 
outages initially, with probability 1. In this case, the expected 
cascade size from ( [20} is a T l = 5.16. An initial outage of the 
other lines in the network is expected to produce much smaller 
cascades. For all of the remaining lines a T l < 2.28. Similarly, 
we can estimate the effect of potential upgrades to this system 
by computing ay in (24). To do so we choose £o = S± j= S 
equal to 1/2 of the j th column of H for each of the 9 lines in 
this network, and set po to a vector of initiating probabilities 
with p 0 ,i = 0.001, Vi. The results (unsurprisingly) tell us 
that modifications to (2-5) and (4-5) will be most beneficial, 


with ck( 2 _ 5 ) = 0.021 and oy 4 _ 5 ) = 0.022. ay < 0.015 for 
the remaining components. Since they have no in-degree (no 
inward links) oti = 0 for (2-4), (3-5) and (3-6). 


By computing ay for each line j from the parameters of the 
influence graph we can quickly estimate the impact of potential 
modifications without the need for extensive simulations of 
these modifications. This metric ( [24} allows us to quickly 
compute, from data, the relative importance (or “criticality”) 
of particular components in a power system. 


B. Critical components in the 6-bus case 

To illustrate the result of combining / and g to produce 
a single influence graph, as defined in ([14}, Fig. [8] shows 
H for the 6-bus test case presented in Sec. ES (For this 
small case we assume that H = H 0 = Hi + .) This figure 
clearly shows the importance of line (2-4), since it has a 
high likelihood of initiating four subsequent outages, which 
can, in turn, propagate additional outages. However, (2-4) only 
appears as a initiating event, as indicated by the fact that this 
node has zero in-degree; other outages do not result in the 
outage of (2-4). As a result, modifying the probability of (2-4) 
failing endogenously (within a cascade) will not have an effect 
on cascade sizes. On the other hand, the failure of either (2-5) 
or (4-5) can propagate as many as three additional outages, 
making these components candidates for potential upgrades. 


C. Polish test case results 

To test these ideas for a larger network, we computed H 0 
and Hi + from f[k\i,Q\, f[k\i , 1+] and g\j\i\, and studied the 
properties of the resulting influence graph. Figure [9]4 shows 
Hi + for the Polish test case. While the detailed structure is 
difficult to visualize, what is clear is that the influence graph 
has a topological structure that is distinctly different from 
that of the underlying physical infrastructure of buses and 
transmission lines. We argue that this difference in structure at 
least partially explains the substantial differences between the 
vulnerability implications that one obtains from power grid 
simulations and from simple topological metrics computed 
from the physical network, as observed in (33). 

In order to test our metric for component criticality, we 
computed ay for each of the n = 2896 branches. To do so all 
of the initiating event probabilities were set to p^o = 1/8760, 
based on the assumption that all components outage at a rate 
of about 1 outage per year. Note that this assumed outage rate 
has no effect on our conclusions about which component is 
the most critical, since the metric in ( [23} is scaled uniformly 
by po. Secondly, we computed the perturbations So and Si 
based on an assumed 50% reduction in the propagation rate. 

Figure [9^ shows the empirical probability distribution for 
the resulting criticality metric. This distribution clearly shows 
a heavy-tailed (nearly power-law) pattern: while the vast 










Fig. 9. (A) Illustration of the combined Hi_|_ for the Polish test case. Blue nodes/links show the physical power grid, and red nodes/links show the influences 
among branch outages. Link thicknesses indicate power flow magnitudes (blue) and influence probabilities (red). Note that for graphical clarity very small 
influences are not depicted in this figure. (B) The probability distribution of the metric oli for the same test case. (C) The risk from cascading blackouts 
initiated by n — 2 contingencies in the original system before and after increasing flow limits for the ten most critical lines. 


majority of the potential upgrades have little-to-no effect 
(ay < 10 -7 for 83% of the components), a few components 
have nearly three orders of magnitude greater impact than this. 

To evaluate the extent to which this information could be 
useful in a planning context, we performed the following 
calculation. We took the ten lines that appeared to be most 
critical from ay and then doubled their flow limits used in 
the cascading failure simulation, without changing the pre¬ 
contingency dispatch or power flows. Something similar to 
this type of increased flow limit could be accomplished by 
disabling backup (e.g., Zone 3) relaying systems for these 
critical lines, by replacing simple distance relays with more 
sophisticated current-differential relaying systems, or by re- 
conductoring. 

After increasing the line flow limits, we re-simulated the 
entire set of n — 2 contingencies and recomputed the cascading 
failure sizes. While this change does not substantially change 
the frequency of small cascades, the impact on the frequency 
of very large cascades, which have the greatest impact on 
blackout risk, is dramatic. The probability of a cascade that 
includes 50 or more outages goes down by 94% in the 
modified system. As shown in Fig. [9p, this relatively small 
modification reduces the risk of large cascading blackouts 
(those resulting in 5% or more load shedding) by about 80%. 

Another way to compare these two cases is to look at the 
propagation rate Ai+ in the system before and after modifica¬ 
tion. In the original data the propagation rate was Ai+ = 0.92. 
In the modified system, the propagation rate is reduced to 
Ai+ = 0.79. This relatively small difference can have a large 
impact on the probability of large cascades. This is typical 
cascading behavior. For example, a simple branching process, 
with one initiating outage and 2896 components, has proba¬ 
bility 0.05 of producing a cascade of size 50 or larger at A = 


0.92. If this rate is reduced to A = 0.79, the probability of rela¬ 
tively small cascades is not changed much, but the probability 
of a cascade of 50 or more is reduced by 85% and the prob¬ 
ability of a cascade of size 100 or more is reduced by 96%. 

It is important to note that this method of finding critical 
upgrades differs substantially from conventional contingency 
ranking approaches. Whereas contingency ranking seek to 
identify components that would have substantial impact on 
the network if they occur as an initiating outage, our method 
seeks to find components that are important if they fail during 
a cascade. Regardless of this difference, one might conjecture 
that standard contingency ranking methods, such as the per¬ 
formance index approach in f34l . might work equally as well, 
since they can be used to find components that substantially 
impact system loading levels. To test whether this was indeed 
the case, we computed the performance index from ll34l for 
each transmission branch in the Polish test case and compared 
the result to a from (24). The two metrics showed only a 
weak correlation (p = 0.2), and a was much more effective 
in selecting components for upgrades. Specifically, we used the 
performance index method to identify 10 branches for upgrade 
(doubling the flow capacity as before), and then simulated the 
614 n — 2 contingencies that produced at least 10 dependent 
transmission line outages in the re-modified case. While the re¬ 
modified case also produced smaller cascades relative to the 
original data, the influence graph method produced a much 
larger reduction in cascade sizes (see Table 0- 


V. Discussion 

These results suggest that the influence graph approach 
can be used to simulate cascades that are statistically similar 
to those produced by an engineering simulator, and (more 
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TABLE II 

Average cascade sizes before and after upgrading 10 lines 

SELECTED FROM EITHER THE PERFORMANCE INDEX (PI) OR INFLUENCE 
GRAPH (IG) METHODS 



Before 

After, PI 

After, IG 

Load shed 

5910 MW 

3682 MW 

1324 MW 

Dependent outages 

55.8 

42.5 

14.1 


usefully) to quickly identify critical components and upgrades 
in a large power network. However, it is important to note that 
the influence graph is a statistical model, and thus includes as¬ 
sumptions that one should be aware of. Different assumptions 
are needed for the formulation, construction and usage of the 
model; the following paragraphs explain these assumptions. 

The influence graph formulation at the beginning of Sec¬ 
tion II makes the standard branching process assumptions of 
independently generating the number of failures in the next 
generation from each of the failures in the current generation. 
The offspring distributions / are assumed to be Poisson with 
parameters Athat can vary with component i and with 
generation m. The model further assumes that the particular 
component failures that fail in the next generation can be 
modeled probabilistically as described in (3) for a single 
component failure in a generation, and that multiple failures in 
a generation propagate independently according to (3). These 
assumptions are also usefully discussed in (25) . In addition, 
the model assumes that multiple outages in a generation each 
propagate independently to the next generation. (This is a 
standard and surprisingly successful assumption in applying 
branching processes.) We acknowledge that outages can and 
do, in real power systems, mutually interact in producing out¬ 
ages in the next generation, but the experience with neglecting 
the branching process assumption and assuming independence 
in power systems [35), G3 (and other subjects [251) is that 
good predictions of the total number of outages can neverthe¬ 
less be made. In power systems, Refs. (35), (27) validate the 
branching process assumption for the purpose of predicting the 
total number of line outages using real line outages. Branching 
processes can match the distribution of number of simulated 
line outages simulated by the OPA simulation in (28) and 
load shed simulated by the OPA simulation (36). There is 
also a match for the distribution of load shed for the TRELSS 
simulation in one case of an industrial system of about 6250 
buses |[36l . Moreover, branching processes can analytically 
approximate CASCADE, a high-level probabilistic model of 
cascading 03 that explicitly models the additive effect of 
multiple line outages at each stage. 

To estimate the parameters of the model (Section II.A), 
one needs to decide on a method for the division of the 
outages in each cascade into generations. If there are multiple 
outages in a generation, then the offspring of each outage 
are approximated by dividing by the number of outages. An 
important assumption required for any statistical technique 
is that sufficient data was simulated for the estimation of 
parameters. The detail in representing the cascading, such as 
how much it depends on the cascade generation, must be 
traded off against this requirement for sufficient data. 

In order to generate cascades with the influence graph in 


Section III, one needs an assumption about how to draw from 
/ and g. There are several possibilities here. For example, 
one could draw to obtain precisely the number of outages 
indicated by the draw from /, but this would require adjusting 
the probabilities in g to reflect the fact that some components 
are already outaged. To avoid this, we assume that the draw 
from / indicates the number of times one should draw from g. 
Since no adjustments to were made to g during this process, 
the result is that some components can be marked for failure 
multiple times. Since real transmission components cannot 
outage multiple times without restoration, we count each failed 
component only once. 

Finally, in order to perform the linear algebra in (16)-(23) 
and hence identify critical elements in Section IV, further 
approximations are needed. Higher order probabilities are 
neglected to obtain the matrix multiplication in (15), and we 
need to allow the system to count the multiple failures of 
some components. In addition, in order to allow (19) we need 
to assume that Hi + does not change over the course of an 
arbitrarily long cascade. 

Our approach to estimate the parameters of /, which de¬ 
scribe the propagation of outages, is based on standard Harris 
estimators from the theory of branching processes 123 . We use 
a rather straightforward approach to estimate the parameters 
of g , which describe the network structure and the interactions 
between outages. More sophisticated approaches may well 
significantly improve the performance of this estimation in 
future work, such as starting from Bayesian priors or adapting 
graph estimation algorithms from machine learning (38) . 

There is a need for future work that evaluates each of these 
assumptions in detail, and works to identify ways to relax 
these assumptions without overcomplicating the model. 

VI. Conclusions 

This paper presents a new approach to cascading failure 
risk analysis, in which we transform massive amounts of data 
from many cascading failures into an “influence graph” that 
describes the many ways that cascades propagate within that 
system. We check that the (much simpler) influence graph 
conforms to the original data by comparing the distribution of 
cascading outage sizes from a cascading failure simulator to 
the distribution of cascade sizes generated from the influence 
graph. The two approaches produce remarkably similar cas¬ 
cade size distributions. In addition, we derive a method with 
which the influence graph can be used to quickly identify 
component upgrades that can have a significant impact on 
cascading failure risk. 
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